speedup the inference of vit (gelu, rmsnorm and fa3 for H-series) and chunked prefill for multimodal #766
Conversation
Force-pushed from f5e5bbd to 9d475cb
Force-pushed from 9d475cb to ca1fef0
Pull Request Overview
This PR accelerates ViT inference by integrating optimized Triton kernels for GELU and RMSNorm, adds FlashAttention-3 support for Hopper (H-series) GPUs, and implements chunked prefill for multimodal scenarios. Key changes include:
- Enhancements to VisualModelRpcServer and model.encode to support per-image maximum patch counts via max_num_list.
- Updates in router, multimodal parameters, and memory cache logic to propagate and utilize a new max_num parameter.
- Integration of Triton kernels for gelu and rms norm, along with adjustments in backend and preprocessing for multimodal inputs.
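For reference, the math the fused GELU and RMSNorm kernels would compute elementwise can be sketched in plain Python (this is a reference of the standard formulas, not the PR's Triton code; the tanh-approximation constant 0.044715 is the commonly used one):

```python
import math

def gelu_tanh(x: float) -> float:
    # GELU, tanh approximation:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def rms_norm(xs: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    # RMSNorm: divide each element by the root mean square of the
    # vector, then scale by a learned per-channel weight.
    rms = math.sqrt(sum(v * v for v in xs) / len(xs) + eps)
    return [v / rms * w for v, w in zip(xs, weight)]
```

A Triton kernel fuses these per-element operations into a single GPU pass, avoiding the extra memory round-trips of separate elementwise ops.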
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lightllm/server/visualserver/model_infer/model_rpc.py | Propagates max_num_list to model.encode in forward for multimodal inference. |
| lightllm/server/router/model_infer/model_rpc.py | Passes is_multimodal flag to chunked prefill backend. |
| lightllm/server/router/model_infer/mode_backend/chunked_prefill/impl.py | Updates chunked prefill to accept is_multimodal parameter. |
| lightllm/server/multimodal_params.py | Introduces max_num parameter and corresponding logic. |
| lightllm/server/embed_cache/*.py | Adds new API for max_num and updates memory cache record structure. |
| lightllm/server/api_http.py | Adjusts multimodal image processing and token counting. |
| lightllm/models/vit/* | Modifies encode and layer inference functions to support new gelu/rms norm kernels. |
| lightllm/models/internvl/* | Updates image token length calculations and preprocessing to include max_num. |
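To illustrate why max_num affects the image token length, here is a deliberately simplified sketch of an InternVL-style dynamic tiling count (the real preprocessing picks the closest supported aspect-ratio grid; the function name, patch size of 448, and 256 tokens per patch are illustrative assumptions, not the PR's code):

```python
def image_token_length(width: int, height: int, max_num: int,
                       patch_size: int = 448, tokens_per_patch: int = 256,
                       use_thumbnail: bool = True) -> int:
    # Cover the image with patch_size x patch_size tiles, capped at
    # max_num; optionally add one global thumbnail tile when tiled.
    cols = max(1, (width + patch_size - 1) // patch_size)
    rows = max(1, (height + patch_size - 1) // patch_size)
    tiles = min(cols * rows, max_num)
    if use_thumbnail and tiles > 1:
        tiles += 1
    return tiles * tokens_per_patch
```

Lowering max_num per image (the new parameter this PR threads through) directly bounds the number of vision tokens the prefill has to process.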
Comments suppressed due to low confidence (2)
lightllm/server/embed_cache/utils.py:16
- [nitpick] The parameter name 'img_str' is ambiguous because it may represent either a file path or a file-like stream. Consider renaming it to something that clearly indicates the expected input type, like 'image_input'.
def image2base64(img_str: str):
lightllm/server/api_http.py:251
- Passing 'response.raw' (a stream) to image2base64 assumes that the function can handle file-like objects. Verify and document the accepted input types for image2base64 or adjust its implementation accordingly.
data = image2base64(response.raw)
if self.tp_rank_id == 0:
    for i in range(len(images_uuids)):
        uid = images_uuids[i]
        max_num_list.append(self.cache_client.root.get_max_num(uid))
Currently, max_num_list is populated only when self.tp_rank_id == 0, which may result in an empty list for other ranks. Consider ensuring a consistent max_num_list is provided to self.model.encode for all cases.
Suggested change:
-        max_num_list.append(self.cache_client.root.get_max_num(uid))
+        max_num_list[i] = self.cache_client.root.get_max_num(uid)
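The reviewer's concern can be addressed by preallocating one slot per image so every rank holds a list of the right length, then synchronizing from rank 0. A hedged sketch (the function name and the injected `broadcast` callable are hypothetical; in practice the collective would be something like torch.distributed.broadcast_object_list):

```python
def gather_max_num_list(images_uuids, cache_client, tp_rank_id, broadcast):
    # Preallocate so non-zero ranks never see an empty or short list.
    max_num_list = [0] * len(images_uuids)
    if tp_rank_id == 0:
        # Only rank 0 talks to the embed cache; indexed assignment
        # matches the suggested change above.
        for i, uid in enumerate(images_uuids):
            max_num_list[i] = cache_client.root.get_max_num(uid)
    # `broadcast` stands in for a collective that copies rank 0's
    # values to all ranks, e.g. broadcast_object_list(..., src=0).
    return broadcast(max_num_list, src=0)
```

Without the broadcast step, each rank would have to query the cache itself, which is the redundancy the rank-0 guard was presumably avoiding.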
Force-pushed from 14b22e4 to 0c19cf6
Force-pushed from 0c19cf6 to bb5b9f7
Force-pushed from 3f76f9e to 52c6b99
Force-pushed from 52c6b99 to 339d98e
lightllm/server/multimodal_params.py (outdated)
@@ -21,6 +22,7 @@ def __init__(self, **kwargs):
        self.image_h = 0

        self._preload_data = None
        self.extra_params = {"image_patch_max_num": kwargs.get("max_num", None)}
Generality.
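The one-word review presumably objects to hard-coding a single "image_patch_max_num" entry. A hedged sketch of a more generic pass-through (the helper name and key list are hypothetical illustrations, not the repository's API):

```python
def build_extra_params(kwargs: dict, known_keys=("max_num",)) -> dict:
    # Collect any recognized optional keys into extra_params instead
    # of hard-coding one model-specific entry in the constructor;
    # adding a new per-model knob then only extends known_keys.
    return {k: kwargs[k] for k in known_keys if k in kwargs}
```

The constructor could then do `self.extra_params = build_extra_params(kwargs)`, leaving absent keys out of the dict rather than storing None.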
Force-pushed from edb87de to 9e3ae23
Force-pushed from 9e3ae23 to 01b3f68
Force-pushed from cb7fd6d to 1f25c14
Force-pushed from 1f25c14 to ea0fe0d
No description provided.